Abstract

Failure modes and effects analyses contain a wealth of information that 
can be used to create the knowledge base required for building automated 
diagnostic expert systems. A real time monitoring and diagnosis expert 
system based on an actual NASA project's matrix failure modes and effects 
analysis was developed at NASA Ames Research Center. This system was 
first used as a case study to monitor the 
Research Animal Holding Facility (RAHF), a Space Shuttle payload that is 
used to house and monitor animals in orbit so the effects of space flight and 
microgravity can be studied. The techniques developed for the RAHF 
monitoring and diagnosis expert system are general enough to be used for 
monitoring and diagnosis of a variety of other systems that undergo a matrix 
FMEA. This automated diagnosis system was successfully used on-line and 
validated on Space Shuttle flight STS-58, mission SLS-2, in October 1993. 


Introduction 

Formal reliability analyses, such as fault tree, digraph, or failure modes and 
effects (FMEA) analyses, are performed on many engineered systems. These 
analyses contain a wealth of information that can be used to help build 
automated diagnostic systems. A significant amount of effort can be saved by 
using reliability analysis information to build a diagnostic system since much 
of the knowledge engineering required for such a system will be done by the 
system engineers while performing the reliability analysis. 

A real time monitoring and diagnosis system based on a matrix failure 
modes and effects analysis (FMEA) has been developed at NASA Ames 
Research Center. This system was used with the Research Animal 
Holding Facility (RAHF). The RAHF is a Space Shuttle Spacelab module 
designed at NASA Ames Research Center that is used to house and monitor 
animals in orbit so the effects of space flight and microgravity can be studied. 
A detailed matrix FMEA was initially performed on the RAHF. Subsequently 
a monitoring and diagnosis system based on the information contained in the 
RAHF matrix FMEA analysis was used to automatically monitor RAHF 
telemetry and help to detect, identify, diagnose, and repair problems that may 
occur in the RAHF system. The verification and validation of this system was 
successfully completed during Space Shuttle flight STS-58, Spacelab Life 
Sciences-2 (SLS-2), in October 1993. 

Matrix Failure Modes and Effects Analysis 
Matrix failure modes and effects analysis (matrix FMEA) is a systematic 
method for tracing the effects of piece part failures on the overall system. A 
matrix FMEA decomposes the system into hierarchical indenture levels. The 
top indenture level represents the entire system. Lower indenture levels 
represent subsystems of the indenture level immediately above them. For 
each indenture level, the possible failures for each part and the effects of those 
failures are listed. The matrix FMEA traces the failure effects from lower 
indenture levels to higher indenture levels. Effects from lower indenture 
levels are treated as failures in the next higher level. Figure 1 illustrates the 
indenture level buildup in a matrix FMEA. Each horizontal row of a matrix 
represents possible failures of a given part, and the vertical columns show the 
effects of those failures [Ref. 1,2,3,4,5]. The diagram shows how the matrix 
FMEA traces failures from the lowest indenture level (circuit) to find their 
effects in the highest level (system). 

Figure 1: Matrix FMEA 

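The indenture level buildup can be illustrated with a minimal sketch. It assumes each level's matrix is stored as a mapping from a failure (row) to its effects (columns), and that effects at one level reappear as failures at the next higher level. The level names other than "circuit" and "system", the part names, and the failure descriptions are hypothetical and are not taken from the RAHF analysis.

```python
# Hypothetical matrix FMEA, lowest indenture level first.
# Each matrix maps a failure (row) to the effects it produces (columns).
LEVELS = [
    ("circuit", {"resistor R1 open": ["heater driver output low"]}),
    ("assembly", {"heater driver output low": ["cage heater off"]}),
    ("system", {"cage heater off": ["cage temperature below limit"]}),
]


def trace_effects(failure, levels=LEVELS):
    """Follow a failure upward and return the effects reached at each level."""
    path = []
    current = [failure]
    for level_name, matrix in levels:
        # Effects of the current failures at this indenture level.
        effects = [e for f in current for e in matrix.get(f, [])]
        if effects:
            path.append((level_name, effects))
            # Effects become the failures considered at the next higher level.
            current = effects
    return path


for level_name, effects in trace_effects("resistor R1 open"):
    print(level_name, "->", effects)
```

Storing one mapping per level keeps each matrix independent, so a level can be revised without touching the others as long as the effect names it exports stay the same.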

Matrix FMEA Based Diagnosis 

Matrix FMEA analysis organizes system failure information into 
structured cause and effect relationships. This structure provides a mapping 
from the matrix FMEA into a set of diagnostic rules [Ref. 6]. Rules are derived 
from the matrix FMEA by treating each failure at a given indenture level as a 
rule antecedent and each effect of that failure as a rule consequent. 
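
The rule derivation described above can be sketched as follows. This is an illustration only: the `Rule` type, the `derive_rules` function, and the sample failure and effect names are hypothetical, and the matrix is assumed to be available as a mapping from each failure to its list of effects.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    antecedent: str  # a failure listed in a matrix row
    consequent: str  # one effect of that failure


def derive_rules(matrix):
    """Emit one rule per (failure, effect) pair in a single indenture-level matrix."""
    return [
        Rule(failure, effect)
        for failure, effects in matrix.items()
        for effect in effects
    ]


# Hypothetical matrix fragment, not taken from the RAHF analysis.
matrix = {"pump seized": ["no water flow", "motor overcurrent"]}
for rule in derive_rules(matrix):
    print(f"if {rule.antecedent} then {rule.consequent}")
```

Because effects at one level are failures at the next, applying the same function to every level and concatenating the results yields a chained rule base.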

This technique was used to build a RAHF diagnostic knowledge base for 
use with the Fault Tree Diagnosis System (FTDS) developed at NASA Ames 
Research Center. FTDS performs diagnostic reasoning using a fault tree 
reliability model as a knowledge base [Ref. 7]. The acyclic graph structure of 
the rule base produced by a matrix FMEA matches the fault tree structure 
required by FTDS. 

FTDS bases its diagnoses on a record of normal and abnormal indicators 
for the system. It builds hypothesis constraint sets from the normal indicators 
and determines the basic causes of the abnormal indicators using a heuristic 
backward chaining search. FTDS is well suited for this telemetry monitoring 
task since the information it requires can be automatically recorded by the 
telemetry stream monitor. 
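
The following is a simplified sketch of this style of reasoning, not the FTDS algorithm itself. It makes two assumptions that the text does not state: a normal indicator rules out every cause that would have produced it, and an exhaustive backward search stands in for the heuristic search. All rule, indicator, and function names are hypothetical.

```python
def ancestors(node, parents):
    """Return every cause that can reach `node` by following rules backward."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for cause in parents.get(current, ()):
            if cause not in seen:
                seen.add(cause)
                stack.append(cause)
    return seen


def diagnose(rules, abnormal, normal):
    """Return basic causes consistent with the abnormal and normal indicators."""
    # Index the rules by effect so the search can run from effect to cause.
    parents = {}
    for cause, effect in rules:
        parents.setdefault(effect, set()).add(cause)

    # Assumption: a normal indicator exonerates everything that would cause it.
    ruled_out = set()
    for indicator in normal:
        ruled_out |= ancestors(indicator, parents)

    suspects = set()
    for indicator in abnormal:
        for cause in ancestors(indicator, parents):
            is_basic = cause not in parents  # no rule explains it further
            if is_basic and cause not in ruled_out:
                suspects.add(cause)
    return sorted(suspects)


# Hypothetical rule base written as (cause, effect) pairs.
RULES = [
    ("water line leak", "high water count"),
    ("valve stuck open", "high water count"),
    ("valve stuck open", "cage moisture alarm"),
]
print(diagnose(RULES, abnormal=["high water count"], normal=["cage moisture alarm"]))
```

In this example the normal moisture alarm removes the stuck valve from consideration, leaving the water line leak as the only remaining suspect.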

The RAHF matrix FMEA analysis included additional information about 
failure detection methods, failure criticality, and failure recovery that proved 
very useful in the development of the monitoring and diagnosis system [Ref. 
5]. The failure detection methods were used to develop the telemetry 
monitoring software, described in the next section, that connects the diagnosis 
system to the RAHF hardware. The matrix FMEA failure criticality and 
recovery information was used by FTDS to provide criticality data and 
corrective actions for any diagnosed failures or performance anomalies 
(Figure 2). 

Telemetry Monitoring 

The RAHF telemetry data was automatically monitored in real time and 
displayed to payload operations controllers on color graphic displays. The 
monitoring system scanned for any anomalies in the telemetry stream. 

The RAHF matrix FMEA included detection methods for most of the 
effects listed in the analysis. These detection methods formed the basis of the 
telemetry monitoring system. The RAHF telemetry stream included 
temperatures and pressures in the RAHF, environmental control system 
status (e.g., heaters on or off), animal data (activity and water consumption) 
and various alarms triggered by sensors throughout the RAHF. The detection 
methods listed in the matrix FMEA allowed the effects of failures to be 
related to these telemetry data points. For example, one detection method 
for a leak in the animal drinking water supply system is to check 
for abnormally high water consumption counts for a given animal. If any of 
the abnormal conditions outlined in the matrix FMEA were detected in the 
telemetry stream, the information was passed to the diagnosis system to 
find the cause of the problem. 
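
A limit check of this kind might look like the sketch below. The telemetry point names, the numeric limits, and the `detect_anomalies` function are hypothetical placeholders chosen for illustration; the actual RAHF limits are not given in this paper.

```python
# Hypothetical limits: telemetry point -> (lowest normal value, highest normal value).
LIMITS = {
    "cage_1_water_count": (0, 40),
    "cage_1_temp_c": (20.0, 29.0),
}


def detect_anomalies(sample, limits=LIMITS):
    """Compare one telemetry sample against its limits and name any anomalies."""
    anomalies = []
    for point, (low, high) in limits.items():
        if point not in sample:
            continue  # point missing from this frame; nothing to check
        value = sample[point]
        if value > high:
            anomalies.append(f"{point} high")
        elif value < low:
            anomalies.append(f"{point} low")
    return anomalies


sample = {"cage_1_water_count": 75, "cage_1_temp_c": 24.5}
print(detect_anomalies(sample))
```

The returned names play the role of abnormal indicators; points that were checked and found within limits can be recorded as normal indicators.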

Figure 2: Augmented FTDS Algorithm 


System Integration 

The detection methods listed in the RAHF matrix FMEA have been added 
to the RAHF diagnostic knowledge base as consequents of the failure effect 
they detect (if effect then detect-method). This allows FTDS to reason 
backward from the detection method consequent to find the cause of the 
anomaly. If the monitoring system detects any anomalies in the RAHF 
telemetry stream, those anomalies will be sent to FTDS as abnormal 
indicators. FTDS will diagnose the cause of the anomalies and return the 
suspected failures, their criticality, and a list of corrective actions to overcome 
the failure. 
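
As a rough sketch of this integration, the example below appends detection methods to a rule base in the form "if effect then detect-method", reasons backward from a triggered detection method, and attaches criticality and corrective actions to each basic failure found. It is not the RAD implementation. The function names, rule contents, criticality label, and corrective action are all hypothetical.

```python
def add_detection_rules(rules, detection_methods):
    """Append one (effect, detect-method) rule per detection method."""
    return rules + [(effect, method) for effect, method in detection_methods.items()]


def report(indicator, rules, failure_info):
    """Reason backward from a detection method and annotate each basic failure."""
    parents = {}
    for cause, effect in rules:
        parents.setdefault(effect, []).append(cause)

    results, stack, seen = [], [indicator], set()
    while stack:
        node = stack.pop()
        causes = parents.get(node, [])
        if not causes and node != indicator:
            # No rule explains this node further, so treat it as a basic failure.
            info = failure_info.get(node, {})
            results.append({
                "failure": node,
                "criticality": info.get("criticality"),
                "actions": info.get("actions", []),
            })
        for cause in causes:
            if cause not in seen:
                seen.add(cause)
                stack.append(cause)
    return results


# Hypothetical knowledge base fragments.
base_rules = [("water line leak", "water supply loss")]
rules = add_detection_rules(base_rules, {"water supply loss": "high water count"})
info = {"water line leak": {"criticality": "high", "actions": ["isolate cage water line"]}}
print(report("high water count", rules, info))
```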

The resulting diagnosis system is referred to as the RAD System (Real-Time 
Automated Diagnosis). The RAD System was installed on a computer 
workstation in the payload telemetry monitoring area during the flight of the 
RAHF in October 1993. During flight operations the workstation displayed 
trend graphs of RAHF information, such as temperature, humidity, and 
animal activity. When anomalies occurred during RAHF system operation, 
the payload operations staff was immediately notified by the RAD System 
(with audio and visual alarms) and a diagnosis screen appeared on the 
workstation that presented the results of the FTDS diagnosis and 
recommended corrective actions. Provisions were made to allow diagnosis 
of conditions not covered by the telemetry (e.g., the Space Shuttle crew reports 
that the lights in the RAHF did not light when the daytime simulation was 
scheduled to start). 

Conclusions 

This research produced general and specific techniques for using 
traditional reliability models for automated diagnosis. By using these 
techniques an automated monitoring and diagnosis system like "The RAD" 
can be produced for any system that undergoes a matrix FMEA analysis. This 
could save a great deal of time and expense that would otherwise be devoted 
to knowledge acquisition and knowledge engineering activities. By using the 
matrix FMEA methodology, the performance issues were addressed as well as 
the safety issues. Since the matrix FMEA contains performance 
information, the diagnostic system can monitor for performance anomalies 
as well as severe failures. The development of the RAD System has also 
provided insight into how reliability analysts can augment their analyses 
with detection methods, corrective actions and severity ratings to further 
facilitate diagnostic reasoning and failure recovery. 